Abstract

In genetics the Moran model describes the neutral evolution of a bi-allelic gene in a population of 
haploid individuals subjected to mutations. We show in this paper that this model can be mapped 
into an influence dynamical process on networks subjected to external influences. The panmictic 
case considered by Moran corresponds to fully connected networks and can be completely solved in 
terms of hypergeometric functions. Other types of networks correspond to structured populations, 
for which approximate solutions are also available. This new approach to the classic Moran model 
leads to a relation between regular networks based on spatial grids and the mechanism of isolation 
by distance. We discuss the consequences of this connection for topopatric speciation and the 
theory of neutral speciation and biodiversity. We show that the effect of mutations in structured 
populations, where individuals can mate only with neighbors, is greatly enhanced with respect 
to the panmictic case. If mating is further constrained by genetic proximity between individuals, 
a balance of opposing tendencies takes place: increasing diversity promoted by enhanced effective 
mutations versus decreasing diversity promoted by similarity between mates. Stabilization occurs 
with speciation via pattern formation. We derive an explicit relation involving the parameters 
characterizing the population that indicates when speciation is possible. 

I. INTRODUCTION 



A basic problem in population genetics is to predict how allele frequencies change in a 
population according to the underlying rules governing reproduction. For very large populations 
the Hardy-Weinberg law applies and no change is expected between consecutive 
generations. However, for finite populations this is not necessarily true, and drift can play 
an important role. 

One of the first models to describe genetic drift in a finite population is the Wright-Fisher 
model [1]. It considers a population of $N$ diploid individuals and a single gene with two 
alleles $A_0$ and $A_1$, so that there are a total of $2N$ genes. Given that the number of alleles 
$A_1$ in the population at time $t$ is $i$, one can easily compute the probability of having $j$ alleles 
$A_1$ at time $t + 1$. Assuming that reproduction occurs by randomly picking $2N$ genes among 
the previous population with replacement and that there is no mutation, this probability is 
given by the binomial distribution 

$$P_{ij} = \binom{2N}{j} \left( \frac{i}{2N} \right)^{j} \left( 1 - \frac{i}{2N} \right)^{2N-j}.$$

These transition probabilities form a matrix whose eigenvalues and eigenvectors contain 
all the information about the evolution of the system. Although the Wright-Fisher matrix 
is rather complicated, several analytical results can be extracted from it and even mutations 
can be included [1]. 

Other models were developed later that allowed for simpler mathematical treatment than 
the Wright-Fisher model or its generalization by Cannings [2]. Of particular importance is 
the Moran model [1, 3, 4], which considers haploid individuals and overlapping generations. 
Here a single hermaphroditic individual reproduces at each time step, with the offspring 
replacing the expiring parent. The transition probabilities can also be written down explicitly, 
and all eigenvalues and eigenvectors of the transition matrix can be calculated for the case of 
zero mutations [5, 6]. When mutations are included, the eigenvalues of the transition matrix and 
the stationary probability distribution, corresponding to the first eigenvector, can still be 
calculated [7]. 

Here we show that the Moran model can be mapped into a dynamical problem on net- 
works, putting this classic model of population genetics in a broader and modern perspective. 
The mapping takes a panmictic population into a fully connected network, where the dynamical 
problem can be completely solved in terms of generating functions [8, 9]. This provides 
a simple and elegant representation of the complete set of eigenvectors of the problem. The 
connection with the network dynamics gives, to our knowledge, the first complete solution 
of the Moran model. 

Networks that are not fully connected map into non-random mating in structured populations. 
In particular, regular networks based on two-dimensional grids relate to spatially 
structured populations where mating is allowed only between neighbors. This, in turn, provides 
the basic mechanism of isolation by distance, as first proposed by Sewall Wright [10]. 
It has been recently shown [11] that this process can lead to speciation, termed topopatric 
speciation, and that the patterns of diversity that arise are fully compatible with the 
characteristics of biodiversity observed across many types of species in nature [12]. Although no 
exact solution exists for the Moran model for structured populations, approximate solutions 
do exist for the equivalent network problem [9]. In this paper we explore this connection to 
discuss the mechanisms underlying topopatric speciation [11]. 

The paper is organized as follows: in sections II and III we define the network dynamical 
system associated to the Moran process and write down its master equation and transition 
probabilities. In section IV we show how the Moran model can be mapped into this network 
problem. In section V we summarize the Moran-network properties: the distribution of 
allele frequencies at equilibrium, with its mean value and variance, and the limit of large 
populations. In section VI we discuss approximations for other network topologies and, in 
section VII, their consequences for speciation. 



II. THE NETWORK DYNAMICAL SYSTEM 

Networks are mathematical structures composed of nodes and links between the nodes. 
The nodes often represent parts of a system and the links the interaction between the parts. 
Networks can model a wide range of systems in biology, engineering and the social sciences 
[13]. In this work we will associate nodes to a particular gene carried by individuals in a 
population and links will be established between individuals that can mate with each other. 
In this section networks will be treated as mathematical abstractions with a particular 
dynamics of network states; the connection with population genetics will be established in 
section IV, although the correspondence with the Moran process will become evident 
as we proceed. 



Consider a network with $N + N_0 + N_1$ nodes. To each node $i$ we assign an internal state 
$x_i$ which can take only the values 0 or 1. The nodes are divided into three categories: $N$ 
nodes are free to change their internal state (according to the rule stated below); $N_1$ nodes 
are frozen in the state $x_i = 1$ and $N_0$ nodes are frozen in $x_i = 0$. The frozen nodes are 
assumed to be connected to all free nodes and we consider them as perturbations to the 
'free' network, composed of the free nodes only. The information about the free network 
topology is contained in its adjacency matrix $A$, defined as $A_{ij} = 1$ if nodes $i$ and $j$ are 
connected, $A_{ij} = 0$ if they are not, and $A_{ii} = 0$. We refer to the free nodes connected to $i$ as 
its neighbors. The degree $k_i = \sum_j A_{ij}$ is the number of neighbors of node $i$. 

The dynamics on the free nodes is defined as follows: at each time step a node is selected 
at random to be updated. With probability $p$ the state of the node does not change, and 
with probability $1 - p$ it copies the state of one of its connected nodes, selected randomly 
among the $k_i$ free neighbors or the $N_0 + N_1$ frozen nodes. If the node to be updated is $i$, then 

$$x_i^{t+1} = \begin{cases} x_i^{t} & \text{with probability } p, \\ x_j^{t} & \text{with probability } \dfrac{1-p}{k_i + N_0 + N_1}, \end{cases}$$

where $j$ is connected to $i$. 
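
As an illustration of the update rule above, the following Python sketch simulates the influence dynamics on a fully connected free network. The function names `influence_step` and `simulate`, the default parameter values and the use of a seeded `random.Random` generator are choices made for this example only; they are not part of the model definition.

```python
import random


def influence_step(x, n0, n1, p, rng):
    """Update one randomly chosen free node of a fully connected network.

    x      -- list with the 0/1 states of the N free nodes (changed in place)
    n0, n1 -- numbers of nodes frozen in state 0 and in state 1
    p      -- probability that the chosen node keeps its current state
    rng    -- a random.Random instance
    """
    n = len(x)
    i = rng.randrange(n)
    if rng.random() < p:
        return  # with probability p nothing changes
    # Otherwise copy one of the k_i = n - 1 free neighbors or one of the
    # n0 + n1 frozen nodes, all chosen with equal probability.
    pick = rng.randrange(n - 1 + n0 + n1)
    if pick < n - 1:
        j = pick if pick < i else pick + 1  # free neighbor j != i
        x[i] = x[j]
    elif pick < n - 1 + n0:
        x[i] = 0  # copied a node frozen in state 0
    else:
        x[i] = 1  # copied a node frozen in state 1


def simulate(start, n0, n1, p=0.5, steps=10000, seed=0):
    """Run `steps` updates starting from the list `start`; return the final states."""
    rng = random.Random(seed)
    x = list(start)
    for _ in range(steps):
        influence_step(x, n0, n1, p, rng)
    return x
```

When there are no nodes frozen in state 0, the all-ones configuration cannot change, so `simulate([1] * 20, 0, 3)` returns twenty ones. When both kinds of frozen nodes are present, the number of ones keeps fluctuating.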

We call this process an influence dynamics, since the state of a node changes according 
to the state of its neighbors. This system can model a number of interesting situations, such 
as, for example: 

(a) An election with two candidates where part of the voters have a fixed opinion while 
the others change their intention according to the opinion of the others. 

(b) A sexually reproducing population of $N$ haploid individuals where the internal state 
represents two alleles of a gene. Taking $p = 1/2$, the update of a node mimics the mating 
of the focal individual with one of its neighbors. The focal individual is replaced by the 
offspring, which can take the allele of each parent with 50% probability. Since the free node 
can also copy the state of a frozen node, the values of $N_0$ and $N_1$ can be associated with 
mutation rates, as we will show later. 

(c) A ferromagnetic material composed of atoms with magnetic moment ±1/2 interacting 
with an external magnetic field. 



Although the influence process is very simple, its analysis can be quite complicated for 
networks of arbitrary topology. We will first consider the simpler case of fully connected 
networks, where $A_{ij} = 1$ if $i \neq j$, $A_{ii} = 0$ and $k_i = N - 1$. Later we will discuss the 
consequences of other topologies and provide approximate results for these cases using the 
fully connected case as a basis. 

III. MASTER EQUATION AND TRANSITION PROBABILITIES 

For fully connected networks the nodes are indistinguishable and there are only $N + 1$ 
global states, which we call $\sigma_k$, $k = 0, 1, \ldots, N$. The state $\sigma_k$ has $k$ free nodes in the state 
1 and $N - k$ free nodes in the state 0. There is no need to count the frozen nodes, since 
they never change. If $P_t(m)$ is the probability of finding the network in the state $\sigma_m$ at 
time $t$, then $P_{t+1}(m)$ can depend only on $P_t(m)$, $P_t(m+1)$ and $P_t(m-1)$, since only one 
node is updated per time step. According to the updating rule above, the dynamics of the 
probabilities is described by the following equation: 

$$\begin{aligned} P_{t+1}(m) = {} & P_t(m) \left\{ p + \frac{1-p}{N(N+N_0+N_1-1)} \left[ m(m+N_1-1) + (N-m)(N+N_0-m-1) \right] \right\} \\ & + P_t(m-1)\, \frac{1-p}{N(N+N_0+N_1-1)}\, (m+N_1-1)(N-m+1) \\ & + P_t(m+1)\, \frac{1-p}{N(N+N_0+N_1-1)}\, (m+1)(N+N_0-m-1). \end{aligned}$$

The term inside the first brackets gives the probability that the state $\sigma_m$ does not change 
in that time step and is divided into two contributions: the probability $p$ that the node 
keeps its state, plus the probability $1 - p$ that the node copies one of its connected nodes, 
weighted by the probability that the copied state coincides with its own. In the latter case, the 
state of the node is $x_i = 1$ with probability $m/N$, and it may copy a different node in the 
same state, $x_j = 1$, with probability $(m - 1 + N_1)/(N + N_0 + N_1 - 1)$. Also, if $x_i = 0$, 
which has probability $(N - m)/N$, it may copy another node with $x_j = 0$ with probability 
$(N - m - 1 + N_0)/(N + N_0 + N_1 - 1)$. The other terms are obtained similarly. 

The probabilities $P_t(m)$ define a vector $P_t$ of $N + 1$ components. In terms of $P_t$ the above 
master equation can be written in matrix form as 

$$P_{t+1} = U P_t \equiv \left( 1 - \frac{1-p}{N(N+N_0+N_1-1)}\, A \right) P_t ,$$

where the evolution matrix $U$, and also the auxiliary matrix $A$, is tri-diagonal. The non-zero 
elements of $A$ are independent of $p$ and are given by 

$$\begin{aligned} A_{m,m} &= 2m(N-m) + N_1(N-m) + N_0 m, \\ A_{m,m+1} &= -(m+1)(N+N_0-m-1), \\ A_{m,m-1} &= -(N-m+1)(N_1+m-1). \end{aligned}$$

These transition elements are the analogue of the Wright-Fisher transition probabilities 
described in the Introduction for the network dynamics. 
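
The matrix form can be checked numerically. The sketch below builds the tridiagonal matrix $A$ and the evolution matrix $U$ with exact rational arithmetic, assuming the matrix elements listed above (diagonal $2m(N-m) + N_1(N-m) + N_0 m$, upper element $-(m+1)(N+N_0-m-1)$, lower element $-(N-m+1)(N_1+m-1)$). The function names `auxiliary_matrix` and `evolution_matrix` are hypothetical helpers introduced for this example.

```python
from fractions import Fraction


def auxiliary_matrix(n, n0, n1):
    """Tridiagonal matrix A, of size (n + 1) x (n + 1), for the fully connected network."""
    a = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for m in range(n + 1):
        a[m][m] = Fraction(2 * m * (n - m)) + n1 * (n - m) + n0 * m
        if m + 1 <= n:
            a[m][m + 1] = -Fraction(m + 1) * (n + n0 - m - 1)
        if m - 1 >= 0:
            a[m][m - 1] = -Fraction(n - m + 1) * (n1 + m - 1)
    return a


def evolution_matrix(n, n0, n1, p):
    """U = 1 - (1 - p) / (n (n + n0 + n1 - 1)) * A, acting on column vectors P_t."""
    c = (1 - Fraction(p)) / (n * (n + n0 + n1 - 1))
    a = auxiliary_matrix(n, n0, n1)
    return [[(1 if r == s else 0) - c * a[r][s] for s in range(n + 1)]
            for r in range(n + 1)]
```

Because $U$ acts on the column vector $P_t$, probability conservation requires every column of $U$ to sum to one, which is equivalent to every column of $A$ summing to zero.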

Let $a_r$ and $b_r$ be the right and left eigenvectors of $U$ (and therefore of $A$) and $\lambda_r$ the 
corresponding eigenvalues, so that $U a_r = \lambda_r a_r$ and $U^T b_r = \lambda_r b_r$. The transition probability 
between two states $\sigma_M$ and $\sigma_L$ after the time $t$ can be written as 

$$P(L, t; M, 0) = \sum_{r=0}^{N} b_{rM}\, a_{rL}\, \lambda_r^{t}, \tag{1}$$

where $a_{rL}$ and $b_{rM}$ are the components of the right and left $r$-th eigenvectors. The eigenvalues 
of $U$ are given by 

$$\lambda_r = 1 - \frac{1-p}{N(N+N_0+N_1-1)}\, \mu_r ,$$

where $\mu_r$ are the eigenvalues of $A$. Equation (1) indicates that the $\lambda_r$ have to be smaller than or 
equal to 1, otherwise $P(L, t; M, 0)$ would eventually become larger than 1. Moreover, the 
eigenvectors corresponding to $\lambda = 1$ completely determine the asymptotic behavior of the 
system, since the contributions of all the others to $P(L, t; M, 0)$ die out at large times. 
The eigenvalues of $A$ are given by [9] 

$$\mu_r = r(r - 1 + N_0 + N_1), \qquad r = 0, 1, \ldots, N,$$

which indeed implies that $0 \le p \le \lambda_r \le 1$. Therefore, if and only if $N_0 = N_1 = 0$ there are 
two asymptotic (absorbing) states, corresponding to $r = 0$ and $r = 1$, given by $\sigma_0$ (all nodes 
in state 0) and $\sigma_N$ (all nodes in state 1). Otherwise there is only one possible asymptotic 
state, corresponding to $r = 0$. All other eigenvectors, related to the transient dynamics, 
can be calculated explicitly in terms of hypergeometric generating functions [9]. We do not 
write them down here because we are only interested in equilibrium properties. 
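
The eigenvalue formula can be verified exactly for small integer examples. The sketch below evaluates $\det(A - \mu\,\mathbb{1})$ with the standard three-term recurrence for tridiagonal determinants, assuming the matrix elements of $A$ quoted in section III and the spectrum $\mu_r = r(r - 1 + N_0 + N_1)$. The helper name `char_poly_at` is hypothetical, and integer $N_0$, $N_1$ are assumed so that all arithmetic is exact.

```python
def char_poly_at(n, n0, n1, mu):
    """Return det(A - mu * I) for the tridiagonal matrix A of the fully connected network."""

    def diag(m):
        return 2 * m * (n - m) + n1 * (n - m) + n0 * m

    def upper(m):  # element A[m][m + 1]
        return -(m + 1) * (n + n0 - m - 1)

    def lower(m):  # element A[m][m - 1]
        return -(n - m + 1) * (n1 + m - 1)

    # D_k = (a_k - mu) D_{k-1} - A[k-1][k] A[k][k-1] D_{k-2}, with D_{-1} = 1.
    d_prev, d = 1, diag(0) - mu
    for m in range(1, n + 1):
        d_prev, d = d, (diag(m) - mu) * d - upper(m - 1) * lower(m) * d_prev
    return d
```

For example, with `n, n0, n1 = 7, 2, 3` the determinant vanishes at each `r * (r - 1 + n0 + n1)` for `r` from 0 to 7, and is non-zero at values that are not in this list.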



IV. MAPPING THE MORAN MODEL ONTO NETWORK DYNAMICS 



In order to map the evolution of a panmictic population of $N$ hermaphroditic individuals 
into the fully connected network problem described above we use the following notation: we 
associate $x_i$ to the allele of the haploid individual $i$, which is either 0 for allele $A_0$ or 1 for 
allele $A_1$. At each time step a random individual $i$ is chosen to reproduce, and a random 
mate $j$ is selected among the remaining $N - 1$ individuals. The focal individual $i$ is then 
replaced by the offspring. 

Reproduction is carried out in two steps. The first step is the sexual reproduction itself: 
with probability 1/2 the allele $x_i$ is passed to the offspring and with probability 1/2 it takes 
the value $x_j$. The second step takes mutation into account: after having taken the allele of 
the focal individual or its mate, the allele might change, from 0 to 1 with probability $\mu_-$ or 
from 1 to 0 with probability $\mu_+$. This corresponds to the Moran model with asymmetric 
mutations and is very similar to the influence process previously described for networks. In 
the framework of networks, the update of the node by keeping its own state or copying the 
state of a free neighbor corresponds to sexual reproduction. Copying the state of a frozen 
node represents mutation and depends on $N_0$ and $N_1$. 



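However, the two processes are not quite the same: in the network dynamics the frozen 
nodes play a role only if the node 'decides' to copy a neighbor (probability $1 - p$). Here 
mutation acts even if the allele is passed from the focal individual $i$ to the offspring. The 
master equation that includes mutation is therefore slightly different. Using $p = 1/2$, which 
is appropriate for unbiased reproduction, we have: 

$$\begin{aligned} P_{t+1}(m) = {} & P_t(m) \left\{ \frac{1}{2}\frac{m}{N}(1-\mu_+) + \frac{1}{2}\frac{N-m}{N}(1-\mu_-) \right. \\ & \quad + \frac{1}{2}\frac{m}{N}\left[ \frac{m-1}{N-1}(1-\mu_+) + \frac{N-m}{N-1}\,\mu_- \right] \\ & \quad \left. + \frac{1}{2}\frac{N-m}{N}\left[ \frac{N-m-1}{N-1}(1-\mu_-) + \frac{m}{N-1}\,\mu_+ \right] \right\} \\ & + P_t(m-1)\, \frac{1}{2}\frac{N-m+1}{N}\left[ \mu_- + \frac{N-m}{N-1}\,\mu_- + \frac{m-1}{N-1}(1-\mu_+) \right] \\ & + P_t(m+1)\, \frac{1}{2}\frac{m+1}{N}\left[ \mu_+ + \frac{m}{N-1}\,\mu_+ + \frac{N-m-1}{N-1}(1-\mu_-) \right]. \end{aligned}$$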

The first terms can be understood as follows: if the population has $m$ individuals with 
allele $A_1$ at time $t$, it can remain that way in the next time step in several ways. First, if 
$x_i = 1$ (probability $m/N$) the offspring can keep the allele $A_1$ if it gets it from individual $i$ 
(probability 1/2) and it does not mutate after reproduction (probability $1 - \mu_+$). Similarly, 
if $x_i = 0$ (probability $(N - m)/N$) the offspring can keep the allele $A_0$ if it gets it from 
individual $i$ (probability 1/2) and does not mutate after reproduction (probability $1 - \mu_-$). 
The other terms have similar interpretations. 

This equation is greatly simplified when written in matrix form. We obtain 

$$P_{t+1} = U P_t \equiv \left( 1 - \frac{1 - 2\bar\mu}{2N(N-1)}\, A \right) P_t , \tag{2}$$

where the non-zero elements of $A$ are given by 

$$\begin{aligned} A_{m,m} &= 2m(N-m) + N_1(N-m) + N_0 m, \\ A_{m,m+1} &= -(m+1)(N-m-1+N_0), \\ A_{m,m-1} &= -(N-m+1)(m-1+N_1), \end{aligned}$$

with 

$$N_0 = \frac{2\mu_+(N-1)}{1 - 2\bar\mu}, \qquad N_1 = \frac{2\mu_-(N-1)}{1 - 2\bar\mu} \tag{3}$$

and 

$$\bar\mu = \frac{\mu_+ + \mu_-}{2}. \tag{4}$$

This is identical to the original matrix $A$ of the network dynamics! Therefore, all the 
known solutions of the network problem can be directly transferred to the genetic problem 
via the above relation between the mutation rates $\mu_-$ and $\mu_+$ and the frozen nodes $N_0$ and 
$N_1$. These solutions are described in the next section. 
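
The mapping can be tested on a single transition. The sketch below computes the probability that the number of $A_1$ alleles increases from $m$ to $m + 1$ in two ways: directly from the reproduction and mutation steps of section IV, and from the network matrix element $-A_{m+1,m} = (N-m)(N_1+m)$ multiplied by the prefactor $(1-2\bar\mu)/[2N(N-1)]$. It assumes $N_0 = 2\mu_+(N-1)/(1-2\bar\mu)$, $N_1 = 2\mu_-(N-1)/(1-2\bar\mu)$ and $\bar\mu = (\mu_+ + \mu_-)/2$. The function names are hypothetical, and the mutation rates should be passed as `Fraction` objects for exact comparison.

```python
from fractions import Fraction


def frozen_nodes(n, mu_plus, mu_minus):
    """Map the mutation rates to the (generally non-integer) frozen-node numbers (N0, N1)."""
    mu_bar = (mu_plus + mu_minus) / 2
    n0 = 2 * mu_plus * (n - 1) / (1 - 2 * mu_bar)
    n1 = 2 * mu_minus * (n - 1) / (1 - 2 * mu_bar)
    return n0, n1


def moran_up(n, m, mu_plus, mu_minus):
    """Probability m -> m + 1 from the two-step rule (reproduction, then mutation)."""
    half = Fraction(1, 2)
    focal_is_0 = Fraction(n - m, n)                           # replaced individual carries A0
    from_self = half * mu_minus                               # own allele 0 mutates to 1
    from_mate0 = half * Fraction(n - m - 1, n - 1) * mu_minus  # mate's allele 0 mutates to 1
    from_mate1 = half * Fraction(m, n - 1) * (1 - mu_plus)     # mate's allele 1 survives
    return focal_is_0 * (from_self + from_mate0 + from_mate1)


def network_up(n, m, mu_plus, mu_minus):
    """The same probability written with the network matrix element -A[m+1][m]."""
    _, n1 = frozen_nodes(n, mu_plus, mu_minus)
    mu_bar = (mu_plus + mu_minus) / 2
    return (1 - 2 * mu_bar) / (2 * n * (n - 1)) * (n - m) * (n1 + m)
```

For instance, with $N = 11$ and $\mu_+ = \mu_- = 1/100$ the mapping gives $N_0 = N_1 = 10/49$, which illustrates that the effective numbers of frozen nodes need not be integers.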



V. EQUILIBRIUM DISTRIBUTION 

The cases $N_0 = 0$ or $N_1 = 0$, corresponding to $\mu_+ = 0$ or $\mu_- = 0$, are trivial since 
all individuals in the population will eventually become identical, with allele $A_1$ or $A_0$ 
respectively. If $N_0$ and $N_1$ are both zero the individuals will also eventually become identical, 
but the probability of each outcome, all $A_0$ or all $A_1$, depends on the initial distribution of 
alleles in the population. 

If $N_0$ and $N_1$ are both non-zero, the probability of finding $m$ nodes in state 1, or $m$ 
individuals with allele $A_1$, in equilibrium is given by [1, 7, 9] 

$$\rho(m) = A(N, N_0, N_1)\, \frac{\Gamma(N_1 + m)\, \Gamma(N + N_0 - m)}{\Gamma(N - m + 1)\, \Gamma(m + 1)}, \tag{5}$$

where 

$$A(N, N_0, N_1) = \frac{\Gamma(N + 1)\, \Gamma(N_0 + N_1)}{\Gamma(N + N_0 + N_1)\, \Gamma(N_1)\, \Gamma(N_0)} \tag{6}$$

is a normalization constant and $\Gamma(x)$ is the Gamma function. This result is valid even if $N_0$ 
and $N_1$ are not integers. In a real network system, when $N_0$ and $N_1$ are integer numbers, 
the Gamma functions can be replaced by factorials. 

Notice that, because of the mutation rates (or frozen nodes), a particular realization of 
the dynamics will never stabilize in any state: the number of individuals with allele $A_1$ will 
always change. The probability of finding the population with $m$ alleles $A_1$, however, is 
independent of time and given by the expression above. One interesting feature of this 
solution is that for $N_0 = N_1 = 1$ we obtain $\rho(m) = 1/(N + 1)$ for all values of $m$, meaning 
that all states are equally likely, no matter how large the population is. 
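
The equilibrium distribution is easy to evaluate numerically. The sketch below uses `math.lgamma` to avoid overflow of the Gamma functions, assuming the form $\rho(m) \propto \Gamma(N_1+m)\,\Gamma(N+N_0-m)/[\Gamma(N-m+1)\,\Gamma(m+1)]$ with the normalization constant given in the text. The helper `apply_a`, a hypothetical name, multiplies a vector by the tridiagonal matrix $A$ of section III, so that stationarity can be checked as $A\rho = 0$.

```python
from math import exp, lgamma


def equilibrium(n, n0, n1):
    """Equilibrium probabilities rho(m), m = 0..n, for n0 > 0 and n1 > 0."""
    log_norm = (lgamma(n + 1) + lgamma(n0 + n1)
                - lgamma(n + n0 + n1) - lgamma(n1) - lgamma(n0))
    return [exp(log_norm + lgamma(n1 + m) + lgamma(n + n0 - m)
                - lgamma(n - m + 1) - lgamma(m + 1))
            for m in range(n + 1)]


def apply_a(rho, n, n0, n1):
    """Return the product A @ rho for the tridiagonal matrix A of the fully connected case."""
    out = []
    for m in range(n + 1):
        v = (2 * m * (n - m) + n1 * (n - m) + n0 * m) * rho[m]
        if m < n:
            v -= (m + 1) * (n + n0 - m - 1) * rho[m + 1]
        if m > 0:
            v -= (n - m + 1) * (n1 + m - 1) * rho[m - 1]
        out.append(v)
    return out
```

With $N_0 = N_1 = 1$ every entry returned by `equilibrium` equals $1/(N+1)$ up to rounding, reproducing the uniform distribution mentioned above.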

The mean value $\bar m = \sum_m m\, \rho(m)$ and the variance $\sigma^2 = \sum_m m^2 \rho(m) - \bar m^2$ can also be 
calculated explicitly. We obtain 

$$\bar m = \frac{N N_1}{N_0 + N_1} \tag{7}$$

and 

$$\sigma^2 = \frac{N N_1 N_0 (N_1 + N_0 + N)}{(N_1 + N_0)^2 (N_1 + N_0 + 1)}. \tag{8}$$

Higher order correlations can also be calculated explicitly, but the results become progressively 
more complicated. 

FIG. 1. Asymptotic probability distribution for a network with $N = 100$ nodes and several values 
of $N_0$ and $N_1$. 
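
The closed forms for the mean and the variance can be checked exactly for small integer cases. The sketch below assumes the mean $N N_1/(N_0+N_1)$ and the variance $N N_0 N_1 (N_0+N_1+N)/[(N_0+N_1)^2 (N_0+N_1+1)]$, and builds $\rho(m)$ from the ratio $\rho(m+1)/\rho(m) = (N_1+m)(N-m)/[(N+N_0-m-1)(m+1)]$ implied by the Gamma-function expression. The function names are hypothetical.

```python
from fractions import Fraction


def equilibrium_exact(n, n0, n1):
    """Exact rho(m) for integer n0, n1 > 0, built from successive ratios."""
    w = [Fraction(1)]
    for m in range(n):
        w.append(w[-1] * Fraction((n1 + m) * (n - m), (n + n0 - m - 1) * (m + 1)))
    total = sum(w)
    return [v / total for v in w]


def moments(rho):
    """Return (mean, variance) of the distribution rho over m = 0..len(rho) - 1."""
    mean = sum(m * v for m, v in enumerate(rho))
    second = sum(m * m * v for m, v in enumerate(rho))
    return mean, second - mean ** 2
```

For $N = 12$, $N_0 = 2$ and $N_1 = 3$ this gives a mean of $36/5$ and a variance of $204/25$, in agreement with the closed forms.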

Figures 1 and 2 show a few examples of the distribution $\rho(m)$ for a network with $N = 100$ 
and various values of $N_0$ and $N_1$. 

If $N$ is very large, $\rho(m)$ peaks around $m_0$ and can be approximated by a Gaussian: 

$$\rho(m) = \rho_0 \exp\left[ -\frac{(m - m_0)^2}{2\Delta^2} \right]$$

with 

$$m_0 = \frac{N N_1}{N_0 + N_1}, \qquad \Delta = \left[ \frac{N N_0 N_1 (N + N_0 + N_1)}{(N_0 + N_1)^3} \right]^{1/2}$$

and 

$$\rho_0 = \frac{1}{\sqrt{2\pi}\, \Delta}.$$

In terms of the continuous variables $x = m/N$, $n_0 = N_0/N$ and $n_1 = N_1/N$ we can also 
write 

$$\rho(x) = \rho_0 \exp\left[ -\frac{(x - x_0)^2}{2\delta^2} \right]$$

with 

$$\delta = \left[ \frac{n_0 n_1 (1 + n_0 + n_1)}{N (n_0 + n_1)^3} \right]^{1/2},$$

$x_0 = m_0/N$ and $\rho_0 = 1/(\sqrt{2\pi}\, \delta)$, showing that the width of the distribution goes to zero as $N$ 
goes to infinity, in agreement with the Hardy-Weinberg law. 
FIG. 2. Equilibrium probability distribution for networks with different topologies. In all cases 
$N = 100$, $N_0 = N_1 = 5$, $t = 10,000$, and the number of simulations is 50,000. The theoretical 
(red) curve is drawn with effective numbers of frozen nodes $N_{0ef} = fN_0$ and $N_{1ef} = fN_1$: (a) 
random network, $N_{0ef} = N_{1ef} = 17$; (b) scale-free network, $N_{0ef} = N_{1ef} = 82$; (c) regular 2-D lattice, 
$N_{0ef} = N_{1ef} = 140$; (d) small world network, $N_{0ef} = N_{1ef} = 140$. 

VI. STRUCTURED NETWORKS 

For networks that are not fully connected the effect of the frozen nodes is amplified. 
To see this we note that the probability that a free node copies a frozen node is 
$P_i = (N_0 + N_1)/(N_0 + N_1 + k_i)$, where $k_i$ is the degree of the node. For fully connected networks 
$k_i = N - 1$ and $P_i \equiv P_{FC}$. For general networks an average value $P_{av}$ can be calculated 
by replacing $k_i$ by the average degree $k_{av}$. We can then define effective numbers of frozen 
nodes, $N_{0ef}$ and $N_{1ef}$, as being the values of $N_0$ and $N_1$ in $P_{FC}$ for which $P_{av} = P_{FC}$. This 
leads to 

$$N_{0ef} = f N_0, \qquad N_{1ef} = f N_1, \tag{9}$$

where $f = (N - 1)/k_{av}$. Corrections involving higher moments can be obtained by integrating 
$P_i$ with the degree distribution and expanding around $k_{av}$. 
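
The rescaling is simple enough to check by hand, and the following sketch does so numerically. It assumes $f = (N-1)/k_{av}$ and the copy probability $(N_0+N_1)/(N_0+N_1+k)$ stated above; the function names are hypothetical.

```python
def rescaling_factor(n, k_av):
    """Factor f = (N - 1) / k_av relating a structured network to the fully connected one."""
    return (n - 1) / k_av


def effective_frozen(n, k_av, n0, n1):
    """Effective numbers of frozen nodes (f * N0, f * N1)."""
    f = rescaling_factor(n, k_av)
    return f * n0, f * n1


def copy_frozen_prob(n0, n1, k):
    """Probability that a free node of degree k copies a frozen node, given that it copies."""
    return (n0 + n1) / (n0 + n1 + k)
```

For a lattice with $N = 100$, $k_{av} = 3.6$ and $N_0 = N_1 = 5$ this gives $f = 27.5$ and effective values of 137.5, close to the rounded value of 140 obtained with $f \approx 28$. By construction, `copy_frozen_prob(5, 5, 3.6)` coincides with `copy_frozen_prob(137.5, 137.5, 99)`.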

Figure 2 shows examples of the equilibrium distribution for four different networks with 
$N = 100$ and $N_0 = N_1 = 5$. Panel (a) shows the result for a random network constructed 
by connecting any pair of nodes with probability 0.3. In this case $k_{av} = 29.7$ and $f = 3.3$. 
The theoretical result was obtained with Eq. (5) with $N_{0ef} = N_{1ef} = 17$. For a scale-free 
network (panel (b)) grown from an initial cluster of 6 nodes, adding nodes with 3 connections 
each following the preferential attachment rule [13], $f = 99/6$ and the effective values of $N_0$ 
and $N_1$ are approximately 82. Panel (c) shows the probability distribution for a 2-D regular 
lattice with $10 \times 10$ nodes connected to nearest neighbors, for which $k_{av} = 3.6$ (the nodes near 
the border have fewer than 4 links) and $f = 99/3.6 \approx 28$. Finally, panel (d) shows a small world 
version of the regular lattice [13], where 30 connections were randomly re-allocated, creating 
shortcuts between otherwise distant nodes. These results show that the approximate rescaling 
of frozen nodes (or, equivalently, the mutation rates) is accurate for many network 
topologies. Still, extreme cases such as a star network do present different distributions and 
this is confirmed by simulations. 



VII. SPECIATION AND BIODIVERSITY 

In the last sections we derived two important theoretical results: (a) the connection be- 
tween the process of influence dynamics on networks and the Moran model; (b) the approxi- 
mate equilibrium distribution for structured networks, obtained by re-scaling the number of 
frozen nodes. We will show now that these two results allow us to infer important properties 
about the genetic evolution of spatially extended populations. 

It has been recently shown [11] that when mating is constrained by both spatial 
and genetic proximity between individuals, neutral evolution by drift alone might lead to 
speciation, i.e., to the spontaneous break up of the population into reproductively isolated 
clusters. Moreover, the patterns of abundance distributions generated by this mechanism 
are compatible with those observed in nature. 

Neutral theories of biodiversity have become rather sophisticated [15], heating the 
neutralist-selectionist debate [20]. In what follows we discuss the process of neutral 
speciation promoted by spatial and genetic constraints, termed topopatric speciation, in the 
light of the theory developed above. 

To make the analysis simpler we will restrict ourselves to the case of symmetric mutation 
rates, $\mu_- = \mu_+ = \mu$ or, equivalently, equal numbers of frozen nodes $N_0 = N_1 = N_z$. In this 
case the connection between mutations and frozen nodes simplifies to 

$$N_z = \frac{2\mu(N-1)}{1 - 2\mu}. \tag{10}$$

Let $P_{id}$ be the probability that two individuals picked at random in the population have 
identical genes at equilibrium. This is given by the sum of the probabilities that their alleles 
are both $A_1$ or both $A_0$: 

$$\begin{aligned} P_{id} &= \sum_{m=0}^{N} \rho(m) \left[ \frac{m}{N}\frac{m-1}{N-1} + \frac{N-m}{N}\frac{N-m-1}{N-1} \right] \\ &= 1 + \frac{2}{N(N-1)} \left[ \sigma^2 + \bar m^2 - N \bar m \right]. \end{aligned}$$

Using equations (7), (8) and (10) we obtain 

$$P_{id} = \frac{1 + N_z}{1 + 2N_z} = \frac{1 + 2\mu(N-2)}{1 + 2\mu(2N-3)}. \tag{11}$$

The probability that the two individuals are different, which is the heterozygosity, is 

$$P_{ht} = 1 - P_{id} = \frac{2\mu(N-1)}{1 + 2\mu(2N-3)} \approx \frac{2\mu N}{1 + 4\mu N}, \tag{12}$$

where the approximation holds for $N \gg 1$. 
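
The identity probability can be cross-checked by summing over the equilibrium distribution directly. The sketch below assumes symmetric mutation with $N_0 = N_1 = N_z = 2\mu(N-1)/(1-2\mu)$, the ratio $\rho(m+1)/\rho(m) = (N_z+m)(N-m)/[(N+N_z-m-1)(m+1)]$, and the closed form $P_{id} = [1 + 2\mu(N-2)]/[1 + 2\mu(2N-3)]$. The function names are hypothetical, and $\mu$ should be passed as a `Fraction` for an exact comparison.

```python
from fractions import Fraction


def identity_probability(n, mu):
    """P_id obtained by summing over the exact equilibrium distribution."""
    nz = 2 * mu * (n - 1) / (1 - 2 * mu)  # symmetric frozen nodes, N0 = N1 = Nz
    w = [Fraction(1)]
    for m in range(n):
        w.append(w[-1] * (nz + m) * (n - m) / ((n + nz - m - 1) * (m + 1)))
    total = sum(w)
    pairs = n * (n - 1)
    # Two random individuals are identical if both carry A1 or both carry A0.
    return sum(v / total * (m * (m - 1) + (n - m) * (n - m - 1)) / pairs
               for m, v in enumerate(w))


def identity_closed_form(n, mu):
    """Closed-form expression for P_id in terms of the mutation rate."""
    return (1 + 2 * mu * (n - 2)) / (1 + 2 * mu * (2 * n - 3))
```

The heterozygosity follows as `1 - identity_probability(n, mu)`.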

Consider now a population in equilibrium where the $N$ individuals have $B$ independent 
genes [11, 14, 21, 23-25]. Since each gene differs between two individuals with probability 
$P_{ht}$, the average genetic distance between two individuals is 

$$\langle d \rangle = B P_{ht} \approx \frac{B}{2} \left( \frac{4\mu N}{1 + 4\mu N} \right). \tag{13}$$

This expression provides a connection between the size of the population and the average 
genetic distance between individuals, which is a measure of diversity within the population. 
Two interesting relations can be derived from this equation: first, for given $B$ and $\mu$ we can 
calculate the size $N_G$ that corresponds to a particular average genetic distance $\langle d \rangle = G$: 

$$N_G = \frac{G}{2\mu(B - 2G)}. \tag{14}$$

Second, for given $N$ and $B$ we calculate the mutation rate $\mu_G$ that corresponds to $\langle d \rangle = G$: 

$$\mu_G = \frac{G}{2N(B - 2G)}. \tag{15}$$

Notice that $N_G\, \mu = N \mu_G$. 
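
These relations can be evaluated directly. The sketch below assumes $\langle d \rangle = (B/2)\,4\mu N/(1 + 4\mu N)$, $N_G = G/[2\mu(B-2G)]$ and $\mu_G = G/[2N(B-2G)]$. The function names are hypothetical, and the value $G = 20$ used in the usage note is purely illustrative.

```python
def mean_distance(n, mu, b):
    """Average genetic distance between two individuals, large-N approximation."""
    return b / 2 * (4 * mu * n) / (1 + 4 * mu * n)


def size_for_distance(g, mu, b):
    """Population size N_G at which the average distance equals g (requires g < b / 2)."""
    return g / (2 * mu * (b - 2 * g))


def rate_for_distance(g, n, b):
    """Mutation rate mu_G at which the average distance equals g (requires g < b / 2)."""
    return g / (2 * n * (b - 2 * g))
```

For $B = 125$, $\mu = 0.001$ and $G = 20$, `size_for_distance` returns about 117.6 individuals, and inserting this size back into `mean_distance` recovers $\langle d \rangle = 20$.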
When mating in panmictic populations is constrained by genetic proximity between individuals, 
so that pairs whose genetic distance is larger than $G$ are incompatible, the distribution 
of genetic distances stays very close to $\langle d \rangle = G$, as if the genome had an effective 
size $B_{ef} = 2G$. On the other hand, if mating is constrained by spatial proximity, the effective 
mutation rate tends to increase. Indeed, spatial restriction in mating corresponds to 
influence processes on networks constructed over regular lattices, which amplifies the effect 
of frozen nodes and, therefore, of mutations. 

Consider a square lattice with $L^2$ nodes and periodic boundary conditions where each 
node is connected only to neighbors which are within a distance $S$ from itself (measured 
in units of lattice spacing). Let $N$ be the number of individuals in the population, so that 
the density is $\rho = N/L^2$. The area where an individual can look for a mate, its 'mating 
neighborhood', is approximately $\pi S^2$, which is also the average degree $k_{av}$ of the network. 

According to our discussion in section VI this can be modeled as a fully connected network 
with effective number of frozen nodes 

$$N_{ef} = f N_z = \frac{N-1}{k_{av}}\, N_z \approx \frac{N}{\pi S^2}\, N_z. \tag{16}$$

The corresponding effective mutation rate is obtained from (10), 

$$N_{ef} = \frac{2\mu_{ef}(N-1)}{1 - 2\mu_{ef}},$$

which gives 

$$\mu_{ef} = \frac{\mu f}{1 + 2\mu(f - 1)}. \tag{17}$$

Note that $\mu_{ef} \to 1/2$ if $\mu f \gg 1$. 
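
The effective mutation rate follows from equating $f N_z$ with the frozen-node number written in terms of $\mu_{ef}$. The sketch below assumes $N_z = 2\mu(N-1)/(1-2\mu)$ and $\mu_{ef} = \mu f/[1 + 2\mu(f-1)]$, and checks that the two are consistent. The function names are hypothetical.

```python
from fractions import Fraction


def frozen_from_rate(mu, n):
    """Number of frozen nodes N_z equivalent to a symmetric mutation rate mu."""
    return 2 * mu * (n - 1) / (1 - 2 * mu)


def effective_rate(mu, f):
    """Effective mutation rate on a network with rescaling factor f = (N - 1) / k_av."""
    return mu * f / (1 + 2 * mu * (f - 1))
```

For $f = 1$ the rate is unchanged, while for $\mu f \gg 1$ it saturates near $1/2$; for example, `effective_rate(0.01, 1e6)` is approximately 0.49998.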

When mating between individuals is constrained by their spatial distance, as measured by 
the parameter $S$, the effective mutation rate (17) can be dramatically enhanced with respect 
to a panmictic population. This, in turn, increases the average genetic distance between 
individuals, which approaches $B/2$ for large populations and fixed $k_{av}$ (corresponding to 
large values of $N_z$). The distribution of genetic distances approaches a broad symmetric 
distribution. 

On the other hand, if mating is constrained only by the genetic distance between indi- 
viduals, the distribution of genetic distances shrinks to about G. This corresponds to an 
effective shrink in genome size from B to 2G. 




When both spatial and genetic restrictions are present, as in [11], the population feels a 
large effective mutation rate, tending to spread out the genome distribution. On the other 
hand, the individuals are compelled by the mating condition to stay genetically close to each 
other. The only stable outcome of these opposing forces is the formation of local groups 
where $\langle d \rangle < G$ within the group but $\langle d \rangle > G$ among groups. This characterizes the groups 
as reproductively isolated from each other and, therefore, as separate species. 

The average number of individuals in each group is given approximately by $N_G$, equation (14), 
which is usually much smaller than $N$. This also implies that the individuals within groups 
are highly connected to each other, so that $f \approx 1$ and $\mu_{ef} \approx \mu$, restoring the equilibrium of 
the system. 

The conditions for speciation can be estimated as follows. When $S$ is very large, the 
effect of the genetic mating restriction is to reduce the effective size of the genome, $B_{ef}$, 
from $B$ to $2G$, so that, from equation (13), $\langle d \rangle$ is at most $G$. As $S$ is reduced, the effective 
mutation rate increases and new genes are incorporated into the effective genome, increasing 
the average genetic distance between individuals. When $\langle d \rangle$ becomes larger than about $2G$ 
the population can no longer hold itself together and splits. This has been confirmed by 
numerical simulations. We write 

$$B_{ef} = 2G + (B - 2G)\, \mathcal{P}, \tag{18}$$

where $\mathcal{P}$ is the probability that a new gene is fixed into the effective genome. 

$\mathcal{P}$ goes to zero for large values of $S$ and reaches one for small $S$. It must depend only 
on the mutation rate $\mu$, the genome length $B$ and the size of the local mating population 
$N_S = \pi S^2 \rho = \pi S^2 N/L^2$. This local mating population has to be at least 2, otherwise 
mating is not possible. More generally, if the minimum number of potential mates for 
reproduction is $P$ we can define the minimum $S$ by $\pi S_{min}^2 \rho = P$, or 

$$S_{min} = L \sqrt{P/(\pi N)}. \tag{19}$$
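
As a quick numerical check of the minimum mating radius, the sketch below evaluates $N_S = \pi S^2 N/L^2$ and $S_{min} = L\sqrt{P/(\pi N)}$ for the parameters quoted in the caption of Fig. 3 ($N = 2000$, $L = 128$, $P = 8$). The function names are hypothetical.

```python
from math import pi, sqrt


def local_mating_population(s, n, l):
    """Average number of individuals N_S inside a mating neighborhood of radius s."""
    return pi * s ** 2 * n / l ** 2


def s_min(p_min, n, l):
    """Smallest radius whose neighborhood holds p_min potential mates on average."""
    return l * sqrt(p_min / (pi * n))
```

With these parameters `s_min(8, 2000, 128)` evaluates to about 4.57, which rounds to the value $S_{min} = 4.6$ quoted in the figure caption.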

$\mathcal{P}$ must be small if the local mating population is large. On the other hand, it must 
increase with the mutation rate and size of the genome. We may therefore write the ansatz 

$$\mathcal{P} = \exp\left\{ -c \left[ \frac{\pi (S - S_{min})^2 N/L^2}{B\mu} \right]^2 \right\}$$

or 

$$\mathcal{P} = \exp\left[ -\frac{\pi^2 (S - S_{min})^4 N^2}{\gamma^4 L^4 B^2 \mu^2} \right], \tag{20}$$

where the constant of proportionality $c$ is rewritten as $\gamma^{-4}$ for convenience. The exponential 
dependence of $\mathcal{P}$ on the square of $N_S/B\mu$ is suggested by numerical simulations. 

FIG. 3. Parameter region where speciation is possible according to equation (22). In this example 
$N = 2000$, $\mu = 0.001$, $B = 125$, $L = 128$, $P = 8$ ($S_{min} = 4.6$) and $\gamma = 6.6$ (see [11]). Plot label: single species. 
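The condition for speciation is 

$$\langle d \rangle = \frac{B_{ef}}{2} \left( \frac{4\mu_{ef} N}{1 + 4\mu_{ef} N} \right) > 2G.$$

Since $\mu N$ is usually of order 1 in most simulations, and $\mu_{ef} \gg \mu$, the factor 
$4\mu_{ef} N/(1 + 4\mu_{ef} N)$ can be safely approximated by 1. Using equations (18) and (20) we obtain 

$$\frac{\pi^2 (S - S_{min})^4 N^2}{\gamma^4 L^4 \mu^2 B^2} < \log\left( \frac{B - 2G}{2G} \right)$$

or 

$$S < S_{min} + \gamma L \left( \frac{\mu B}{\pi N} \right)^{1/2} \left[ \log\left( \frac{B - 2G}{2G} \right) \right]^{1/4} \equiv S_c. \tag{21}$$

Inverting this equation we obtain 

$$G < \frac{B/2}{1 + \exp\left[ \dfrac{\pi^2 N^2 (S - S_{min})^4}{\gamma^4 L^4 \mu^2 B^2} \right]} \equiv G_c(S), \tag{22}$$

which gives the maximum value of $G$ for which speciation is possible at a given $S$. 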

Equation (21) gives the maximum size of the mating neighborhood for which speciation 
is possible. This analytical result describes the dependence of speciation on 6 model parameters: 
$B$, $G$, $\mu$, $P$, $L$ and $N$. It provides a very good quantitative estimate for the parameter 
region where speciation is possible, as illustrated in figure 3. The result also incorporates 
cutoffs at $G = B/4$ and at $S_{min}$, which are in agreement with numerical simulations [11]. 
Furthermore, it also gives the scaling dependence of $S_c$ on these various parameters. In particular, 
it predicts speciation at large values of $S$ if $B$ is sufficiently large. This corroborates 
the results in [21, 22] but shows that such space-independent speciation occurs only for very 
large values of $B$, since $S_c$ increases with $B^{1/2}$. 
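
The speciation boundary can be explored numerically. The sketch below assumes the forms $S_c = S_{min} + \gamma L\,(\mu B/\pi N)^{1/2}\,[\log((B-2G)/2G)]^{1/4}$ and $G_c(S) = (B/2)/\{1 + \exp[\pi^2 N^2 (S-S_{min})^4/(\gamma^4 L^4 \mu^2 B^2)]\}$, with $S_{min} = L\sqrt{P/(\pi N)}$. The function names, the keyword `p_min` (standing for the minimum number of mates $P$) and the overflow guard are choices made for this example. The parameter values in the usage note are those quoted in the caption of Fig. 3, and $G = 20$ is illustrative.

```python
from math import exp, log, pi, sqrt


def s_min(p_min, n, l):
    """Minimum mating radius S_min = L * sqrt(P / (pi * N))."""
    return l * sqrt(p_min / (pi * n))


def critical_g(s, n, mu, b, l, p_min, gamma):
    """Largest genetic threshold G_c(S) for which speciation is predicted."""
    x = (pi ** 2 * n ** 2 * (s - s_min(p_min, n, l)) ** 4
         / (gamma ** 4 * l ** 4 * mu ** 2 * b ** 2))
    if x > 700:  # exp(x) would overflow; G_c is numerically zero here
        return 0.0
    return (b / 2) / (1 + exp(x))


def critical_s(g, n, mu, b, l, p_min, gamma):
    """Largest mating radius S_c for which speciation is predicted (needs 0 < g < b / 4)."""
    if not 0 < g < b / 4:
        raise ValueError("critical_s requires 0 < g < b / 4")
    scale = gamma * l * sqrt(mu * b / (pi * n))
    return s_min(p_min, n, l) + scale * log((b - 2 * g) / (2 * g)) ** 0.25
```

With `n=2000, mu=0.001, b=125, l=128, p_min=8, gamma=6.6`, `critical_g` evaluated at $S = S_{min}$ returns $B/4 = 31.25$, the cutoff mentioned above, and `critical_s(20, ...)` gives a radius of roughly 8 lattice units.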



Our analytical results constitute an important addition to the simulations presented in [11] 
and contribute to the understanding of the significant role of drift in speciation 
[17, 26]. Equation (21) identifies the combination of parameters that makes this possible. 
For example, low mutation rates, which hinder speciation, can be compensated by a large 
number of participating genes or by low population density. 



VIII. CONCLUSIONS 

The process of speciation underlies the creation of the tree of life. Fossil records and 
molecular analysis allow the construction of detailed phylogenetic trees linking species to 
their ancestors, identifying the branching points of speciation. The way speciation occurred 
in each case, however, is rarely known with certainty and several mechanisms have been 
considered. A recently proposed mechanism of speciation [11] demonstrated that a spatially 
extended population can break up spontaneously into species when subjected to mutations 
and to spatial and genetic mating restrictions, even in the absence of natural selection. 
Numerical simulations have shown that this mechanism, termed topopatry, occurs for a 
restricted range of parameters, which include the population size $N$, the mutation rate $\mu$ and the 
parameters $S$ and $G$ controlling the spatial and genetic mating restrictions. 

In this paper we have introduced a mapping of genetic dynamics in an evolving population 
onto the dynamics of influence on a network, and used this mapping to analytically study the 
process of topopatric speciation. This mapping gives, to our knowledge, the first complete 
solution of the Moran model, providing an elegant representation of the complete set of 
eigenvectors of the problem. 

We have shown that, while fully connected networks correspond to panmictic populations, 
certain structured networks can be mapped into dynamic spatially extended populations. 
Moreover, the mapping shows that limiting mating to a fraction of the total population 
by network connections increases the effective mutation rate as compared to the panmictic 
case, and increases the genetic diversity of the population. By extending the model from one 
to multiple independent biallelic genes, we have shown that a genetic restriction on mating 
decreases the effective size of the genome, decreasing diversity. These opposing forces are 
resolved not by compromise but by pattern formation, breaking up the population into 
multiple species. This process, and its dependence on the most relevant characteristics of 
the population, is accurately described by equation (22). This equation provides a new 
and important tool to understand neutral speciation, revealing explicitly the relationships 
among the parameters involved in the process, and the interplay of genetic processes whose 
opposition leads to spontaneous speciation. 